Cytometry Part A — Latest Matching Preprints

1

CyStainer: A transformer-based variational autoencoder for robust marker imputation in high-parameter cytometry

Ivanov, K.; Moussawy, M. A.; Kirk, F.; Samuli, R.; Lohi, O.; Olsen, L.; Modvig, S.; Hautamäki, V.; Heinäniemi, M.

2026-06-30 bioinformatics 10.64898/2026.06.30.735235 medRxiv

Top 0.1%

31.6%

High-parameter cytometry is essential for clinical diagnostics through precise immune cell profiling, improved patient stratification, and monitoring, while also enhancing the understanding of cellular responses in disease and therapeutic contexts. The volume of cytometry data is growing fast, and with it the need to merge different datasets for unified analysis. Here, we present CyStainer, a transformer-based variational autoencoder that demonstrates competitive or superior performance to existing methods on several key tasks related to marker prediction. As a key novelty, we demonstrate that CyStainer can impute markers without having a set of shared backbone markers. We performed several benchmarks using real-world FACS, CyTOF, InfinityFlow and CITE-seq datasets to show that CyStainer is a robust and flexible tool for panel merging, marker imputation, dataset integration and virtual staining of unseen samples.

2

Developing Buoyant-Analyte-Magnetic (BAM) Assays for Ultrasensitive Yet Rapid Point-of-Care Detection

Wang, C.; Satterfield, E.; Erwin, N.; Correa, J.; Wampler, W.; Dean, D.; Moschella, P.; Anker, J.

2026-06-26 emergency medicine 10.64898/2026.06.15.26355555 medRxiv

Top 0.1%

7.0%

Rapidly detecting infectious diseases such as COVID-19 is essential to control outbreaks and treat patients early. However, no available screening method combines low cost, portability, speed (<20 min, ideally <5 min), and ultrasensitivity (e.g., <1 virus/µL): lateral flow assays are fast, portable, and inexpensive but insensitive, whereas ultrasensitive assays require centralized labs with long turnaround times. We recently developed an ultrasensitive immunoassay that captures, separates, and counts saliva biomarker molecules using buoyant microbubbles and magnetic microspheres, but the original assay took 55 minutes and was not readily deployable. Here, we redesigned the assay protocol and reader for emergency medicine and mobile care by streamlining the workflow, collecting saliva with larger swabs, filtering it through a 10 µm cap, and using larger microbubbles to accelerate flotation. A paramedic successfully ran the assay on the back of a parked medical van in 3.5 minutes (spit-to-results) while achieving a 1.3 fg/mL analytical detection limit for SARS-CoV-2 nucleocapsid protein (~0.04 virus/µL). The assay remained positive across 9 orders of magnitude. We describe the challenges and opportunities ahead for point-of-care deployment.

3

Generative cell phenotyping with structured latent populations

Bodart, F.; De Voeght, A.; Baron, F.; Louppe, G.

2026-07-03 bioinformatics 10.64898/2026.06.30.735507 medRxiv

Top 0.1%

6.3%

Flow cytometry produces high-dimensional single-cell protein measurements central to immunophenotyping and clinical monitoring. Yet analysis still relies largely on manual gating, which is labour-intensive, poorly reproducible, and ill-suited to large marker panels. Existing computational approaches address classification or discovery in isolation, treating cell-type identity as a post-hoc annotation rather than as part of the generative model itself. We present MARVIN, a semi-supervised variational autoencoder that encodes the assumption that cells organise into discrete populations with continuous intra-population variability through a Gaussian mixture prior in the latent space. Because each component represents a distinct cell population, classification, discovery, and density estimation emerge as complementary views of the same representation. On public benchmarks, MARVIN matches or exceeds existing methods using as few as 10% labelled cells. Trained exclusively on healthy samples, it identifies leukaemic cells through elevated reconstruction error, providing an unsupervised anomaly detection signal. On paired stimulation data, it maintains stable population assignments while capturing condition-specific shifts in abundance and marker expression at patient-level resolution. MARVIN is open-source and designed for local deployment, adapting to institution-specific panels and instruments.
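
To make the "complementary views" idea concrete, here is a minimal sketch of how a Gaussian mixture over a latent space yields both a density estimate and a soft population assignment for the same point. This is a generic illustration, not MARVIN's code: the diagonal covariances, the two toy components, and the function names are assumptions chosen for the example.

```python
import math


def log_gaussian_diag(z, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at point z."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
        for x, m, v in zip(z, mean, var)
    )


def mixture_views(z, weights, means, variances):
    """Return (log density of z, per-component responsibilities).

    The log density is the density-estimation view; the responsibilities
    (posterior probability of each component given z) are the
    classification view of the same mixture.
    """
    log_joint = [
        math.log(w) + log_gaussian_diag(z, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(log_joint)  # log-sum-exp trick for numerical stability
    log_density = top + math.log(sum(math.exp(l - top) for l in log_joint))
    responsibilities = [math.exp(l - log_density) for l in log_joint]
    return log_density, responsibilities


# Hypothetical two-population latent space (values invented for illustration).
weights = [0.5, 0.5]
means = [(0.0, 0.0), (4.0, 4.0)]
variances = [(1.0, 1.0), (1.0, 1.0)]

log_p, resp = mixture_views((0.5, 0.2), weights, means, variances)
assigned = max(range(len(resp)), key=resp.__getitem__)
print(f"log density: {log_p:.3f}, assigned component: {assigned}")
```

A point lying far from every component would receive a low log density under this toy mixture; note that the abstract itself reports using reconstruction error, not mixture density, as the anomaly signal.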

4

Semi-automated reconstruction of glomerular architecture from 3D confocal microscopy data

Loyd, Y. M.; Chase, S. E.; Krendel, M.

2026-07-10 cell biology 10.64898/2026.07.03.736410 medRxiv

Top 0.1%

5.7%

Nephrons are the functional units of the kidney; within each nephron, the glomerulus is the initial site of selective filtration that allows removal of waste products while preserving proteins in the bloodstream. Each glomerulus consists of a network of capillaries surrounded by specialized epithelial cells, podocytes, which mediate selective filtration. Abnormalities in glomerular structure impair renal function, resulting in proteinuria and kidney disease. Although several microscopy-based approaches exist to characterize glomerular architecture and structural abnormalities, quantitative analysis is often limited by labor-intensive image segmentation. In this study we present a semi-automated approach for segmentation and analysis of glomerular architecture from three-dimensional confocal microscopy data. Using mTmG transgenic mice that express membrane-associated EGFP in podocytes and membrane-associated tdTomato across all other cell types, we reconstruct podocyte processes and glomerular capillaries from volumetric renal images. This semi-automated approach reduces manual segmentation effort and supports more efficient, standardized analysis of glomerular architecture in three-dimensional confocal microscopy datasets.

5

Hierarchical classification of hematologic malignancies using epigenetic and genetic information

Schönung, M.; Türe, M.; Lajer, P.; Renders, S.; Rausch, T.; Steinicke, T. L.; Dolnik, A.; Sträng, E.; Oak, M. S.; Heilmann, J.; Roth, K.; Katzenstein, L.; Rohde, C.; Sollier, E.; Horak, P.; Sauer, T.; Strefford, J. C.; Duran-Ferrer, M.; Oakes, C. C.; Martin-Subero, J. I.; Germing, U.; Dworzak, M.; Catala, A.; Flotho, C.; Niemeyer, C. M.; Döhner, H.; Hovestadt, V.; Fröhling, S.; Schlenk, R. F.; Heidel, F. H.; Korbel, J.; Gerhäuser, C.; Hartmann, M.; Müller-Tidow, C.; Lutsik, P.; Hundemer, M.; Erlacher, M.; Bullinger, L.; Plass, C.; Lipka, D. B.

2026-07-09 cancer biology 10.64898/2026.07.02.735835 medRxiv

Top 0.1%

4.3%

Molecular testing in hematology requires different assays for disease subgroup identification, risk stratification and selection of appropriate treatment regimens. Yet, molecular tests are not necessarily standardized between diagnostic laboratories, resulting in varying turnaround times and potentially divergent results. To resolve this issue and enable single-assay molecular testing, we have developed a hierarchical classification framework that combines epigenetic and genetic data from whole genome nanopore sequencing (WGNS) with machine learning to determine disease entities, epigenetic subgroups (epitypes) and genetic aberrations in hematopoietic neoplasms. We curated DNA methylation data from 5,420 samples and trained a classifier allowing entity-level diagnostics featuring 21 conditions, including healthy controls, acute and chronic myeloid and lymphoid neoplasms. This classifier was subsequently combined with entity-specific epitype classifiers predicting 44 therapeutically or prognostically relevant states, followed by integration of genetic data. Benchmarking of the combined (epi-)genetic testing strategy using WGNS confirmed high accuracy in the detection of diagnostic groups and risk stratification, and identified diagnosis-defining molecular alterations that were not reported by standard-of-care work-up.

6

A Universal Immune Index (II): A Composite Quantitative Assessment Method and Calculation Tool for Immune Function Based on Multidimensional Routine Laboratory Parameters

Zhang, Y.; Li, K.

2026-06-25 allergy and immunology 10.64898/2026.06.22.26356269 medRxiv

Top 0.1%

2.6%

Background: Quantitative assessment of immune function is essential for clinical and health decisions in oncology, post-surgical management, and autoimmune diseases. Existing methods are either too simplistic (single indicators) or too complex and costly for routine use. A standardized, easy-to-operate tool based on routine laboratory parameters is needed for both clinical and health checkup settings. Methods: We propose the Immune Index (II), integrating 9 routine laboratory parameters across three dimensions: humoral immunity (IgG, complement C3, C4), cellular immunity (CD4+ T cells, CD8+ T cells, CD4+/CD8+ ratio), and inflammatory response (CRP, IL-6, systemic immune-inflammation index [SII]). Indicators were normalized using min-max normalization to a 0-100 scale and aggregated with fixed weights (humoral 30%, cellular 40%, inflammatory 30%). The II score ranges from 0 to 100, with a healthy reference range of 50-80. Results: A four-tier grading system was established: ≥80 (immune overactivation), 50-80 (immune homeostasis), 35-50 (mild immune suppression), <35 (severe immune deficiency). Validation using 209 cases from published literature showed an AUC of 0.924 (95% CI: 0.87-0.97) for distinguishing normal from abnormal immune status, with an optimal cutoff of 47.8 (sensitivity 84.8%, specificity 85.9%). II scores were 56.7 ± 8.6 (healthy), 43.5 ± 8.0 (immunodeficient), and 33.6 ± 6.5 (autoimmune), with P<0.001 between all groups. The calculation requires only two steps and can be implemented in Excel or LIS. II can serve as an immune dimension supplement for personal health checkups. Conclusion: The Immune Index provides a simple, standardized, and low-cost tool for quantitative immune function assessment. The fixed-weight design ensures cross-institutional comparability, making it suitable for outpatient clinics, health checkup centers, and primary care settings.
Keywords: Immune index; immune function; quantitative assessment; routine laboratory parameters; composite score; min-max normalization
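
As a rough illustration of the two-step calculation described in the Methods (min-max normalization, then fixed dimension weights), a minimal Python sketch follows. The abstract gives the weights and the grading tiers but not the normalization bounds, how indicators are combined within a dimension, or how tier boundaries are handled. The equal-weight averaging within each dimension, the inclusive lower bounds in `grade`, and the example scores are therefore assumptions for illustration, not the authors' published tool.

```python
# Dimension weights as stated in the abstract.
WEIGHTS = {"humoral": 0.30, "cellular": 0.40, "inflammatory": 0.30}


def min_max(value, lower, upper):
    """Scale a raw value to 0-100 between the given bounds, clipping outliers.

    The bounds are supplied by the caller; the abstract does not state them.
    """
    if upper <= lower:
        raise ValueError("upper bound must exceed lower bound")
    scaled = 100.0 * (value - lower) / (upper - lower)
    return max(0.0, min(100.0, scaled))


def immune_index(normalized):
    """Weighted sum of per-dimension mean scores (each already on 0-100).

    Averaging indicators equally within a dimension is an assumption.
    """
    total = 0.0
    for dimension, weight in WEIGHTS.items():
        scores = normalized[dimension]
        total += weight * sum(scores) / len(scores)
    return total


def grade(score):
    """Map a score to the four tiers; lower bounds are treated as inclusive."""
    if score >= 80:
        return "immune overactivation"
    if score >= 50:
        return "immune homeostasis"
    if score >= 35:
        return "mild immune suppression"
    return "severe immune deficiency"


# Hypothetical, already-normalized indicator scores for one person.
example = {
    "humoral": [60.0, 70.0, 50.0],       # IgG, C3, C4
    "cellular": [55.0, 65.0, 60.0],      # CD4+, CD8+, CD4+/CD8+ ratio
    "inflammatory": [70.0, 80.0, 60.0],  # CRP, IL-6, SII
}
score = immune_index(example)
print(f"II = {score:.1f} ({grade(score)})")
```

With these made-up inputs the dimension means are 60, 60, and 70, so the weighted sum lands at about 63, inside the 50-80 range the abstract labels immune homeostasis.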

7

Rapid immunostaining and high-resolution three-dimensional light-sheet microscopy of intact calcified tissues

Ding, Z.; Shi, Y.; Liu, H.; Li, C.; Chen, J.; Cohen-Solal, M.; Kusumbe, A. P.

2026-07-10 cell biology 10.64898/2026.07.04.736531 medRxiv

Top 0.2%

2.1%

High-resolution 3D imaging is an important strategy for visualizing and analysing complex skeletal tissue architecture and the bone marrow microenvironment. However, multicolor immunolabeling and imaging of intact skeletal tissues are technologically challenging. The current immunolabeling and clearing methods for intact skeletal elements are very limited, time-consuming and generate low-resolution data or depend on the use of reporter mice. Here, we describe a protocol for efficient clearing and immunolabeling of intact calcified tissues that enables superfast, single-cell resolution, and quantitative 3D light-sheet imaging of intact skeletal elements and teeth. A key aspect of our protocol is the addition of a collagenase digestion step after fixation and decalcification. This step enhances antibody penetration, resulting in deep, comprehensive staining throughout immunostained bones and other calcified tissues. The protocol includes soft tissue removal, fixation, decalcification, bone dehydration, and bleaching, followed by antigen retrieval and permeabilization before the collagenase digestion step. This procedure is performed to prepare the samples for the tissue clearing process that improves bone tissue transparency prior to light-sheet imaging. The entire protocol, from bone collection to image analysis and quantification, takes about 4 days to complete, thus offering significant improvements over previous methods. This protocol is broadly applicable to the visualization of bone microstructure, bone marrow analysis, vascular and neural network mapping, and the study of signaling molecules in bone development and growth. The protocol requires experience with standard tissue processing and immunostaining techniques, and prior experience in tissue clearing and light-sheet imaging is beneficial but not essential. 
Key points:
- A protocol for efficient clearing and immunolabeling of intact calcified tissues that enables superfast, high-resolution, and quantitative 3D imaging of various intact bones and teeth.
- The entire protocol takes only 4 days to complete the comprehensive staining and perfect transparency throughout the intact bones, offering significant improvements over previous methods.
Key references: Biswas, L. et al. Cell 186, 382-397.e24 (2023): https://doi.org/10.1016/j.cell.2022.12.031

8

MCD Stitcher: An open-source tool for whole-slide stitching and conversion of Imaging Mass Cytometry data

Chaurasia, P.

2026-07-01 bioinformatics 10.64898/2026.06.26.732348 medRxiv

Top 0.2%

2.1%

Imaging Mass Cytometry (IMC) combines metal-tagged antibody labelling with laser ablation mass spectrometry to generate highly multiplexed spatial images of tissue sections. However, the area that can be acquired within a single region of interest (ROI) is limited by hardware and software constraints, requiring large tissues to be imaged as multiple tiled ROIs. Reconstructing these ROIs into whole-slide images requires additional processing, while the proprietary .mcd file format can hinder integration with standard bioimage analysis workflows. Here, we present MCD Stitcher, an open-source Python package for converting .mcd files into OME-TIFF images with automated whole-slide stitching. The tool supports rectangular and polygonal ROIs, accommodates variable pixel sizes between ROIs, and uses memory-aware chunked reading during data ingestion to process large datasets on standard workstations. The generated OME-TIFF outputs preserve spatial, channel, and acquisition metadata for downstream analysis in tools such as QuPath, napari, and ImageJ/Fiji. MCD Stitcher provides a reproducible workflow for converting raw IMC data into interoperable image formats, enabling whole-slide spatial analysis without reliance on vendor-specific software.

9

SupeRJump: Determining normal and leukemic differentiation fate through semi-supervised jump diffusion modeling

Bowman, M.; Bandopadhyay, R.; Singh, V.; Telpoukhovskaia, M.; Vander Velde, R.; Shaffer, S. M.; Trowbridge, J. J.; Bowman, R. L.

2026-07-07 bioinformatics 10.64898/2026.07.01.735284 medRxiv

Top 0.2%

1.8%

Single cell RNA-seq (scRNA) has provided unprecedented resolution into cellular and clonal heterogeneity. Computational approaches have enabled recovery of differentiation dynamics, yet current approaches do not evaluate discontinuous differentiation processes present in malignant leukemia. To address these gaps, we developed SupeRJump: a jump-drift-diffusion based supervised cell-fate model (https://github.com/namwob44/SupeRJump/). We deploy this approach in human bone marrow, murine aging hematopoiesis, and lentivirally barcoded mouse models of acute myeloid leukemia. Our framework introduces a semi-supervised pseudotime strategy to fit a jump-drift-diffusion model and batch correction for lineage fate predictions from absorbing Markov chains. We introduce metrics to quantify cell skewness toward particular lineages, transitions through intermediate progenitor states toward terminally differentiated states, and discontinuous transition dynamics. We use these metrics to identify cells preferentially biased for differentiation, their underlying transcriptional networks, and gene programs responsible for differentiation discontinuity.
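
The abstract mentions lineage fate predictions from absorbing Markov chains. As a generic illustration of that idea (not the SupeRJump implementation), the sketch below computes, for a toy chain, the probability that a walk starting in each transient state is eventually absorbed into each terminal state. The state names and transition probabilities are invented for the example, and the fixed-point solver is just one simple way to obtain the result.

```python
def absorption_probabilities(Q, R, tol=1e-12, max_iter=10_000):
    """Solve B = R + Q B by fixed-point iteration.

    Q[i][j]: probability of moving from transient state i to transient state j.
    R[i][k]: probability of moving from transient state i to absorbing state k.
    B[i][k]: probability that a walk started in i is eventually absorbed in k.
    """
    n, m = len(R), len(R[0])
    B = [row[:] for row in R]
    for _ in range(max_iter):
        new_B = [
            [R[i][k] + sum(Q[i][j] * B[j][k] for j in range(n)) for k in range(m)]
            for i in range(n)
        ]
        delta = max(abs(new_B[i][k] - B[i][k]) for i in range(n) for k in range(m))
        B = new_B
        if delta < tol:
            return B
    raise RuntimeError("fixed-point iteration did not converge")


# Toy chain (all numbers hypothetical).
transient = ["stem", "progenitor"]
absorbing = ["myeloid", "lymphoid"]
Q = [
    [0.5, 0.3],  # stem -> stem, stem -> progenitor
    [0.0, 0.2],  # progenitor -> stem, progenitor -> progenitor
]
R = [
    [0.0, 0.2],  # stem -> myeloid, stem -> lymphoid
    [0.6, 0.2],  # progenitor -> myeloid, progenitor -> lymphoid
]

fate = absorption_probabilities(Q, R)
for name, row in zip(transient, fate):
    summary = ", ".join(f"{a}={p:.2f}" for a, p in zip(absorbing, row))
    print(f"{name}: {summary}")
```

Each row of the result sums to 1, so it can be read as a fate distribution for a cell in that state; in this toy chain the stem state leans lymphoid (0.55) while the progenitor state leans myeloid (0.75).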

10

Automated Phenotypic Characterization in Rare Hematologic Malignancies Using a Large Language Model-Based Framework

Khan, M. A.; Ayub, U.; Jajja, S. A.; Anjum, M. U.; Warraich, K.; Jain, P.; Oberoi, J. K.; Al Abbas, M.; Sadiq, M. H.; Sarfraz, M. U.; Huang, Z.; Riaz, I. B.; Palmer, J. M.

2026-07-09 health informatics 10.64898/2026.06.26.26356633 medRxiv

Top 0.2%

1.5%

Background. Diagnosis and risk stratification in rare hematologic malignancies such as myeloproliferative neoplasms (MPNs) - polycythemia vera (PV), essential thrombocythemia (ET), and myelofibrosis (MF) - require expert review of longitudinal, heterogeneous clinical records. This process is cognitively demanding, inconsistently applied, and difficult to scale beyond tertiary centers. No automated phenotyping workflow currently exists for hematologic malignancies. Methods. A HIPAA-compliant large language model (LLM) framework for phenotyping MPN was developed to integrate (i) rule-based retrieval of bone marrow biopsy reports, clinical notes, and structured laboratory results from the electronic health record (EHR); (ii) zero-shot extraction of diagnostic and prognostic variables from unstructured text using GPT-4 Turbo; (iii) a clinician-informed source-prioritization algorithm to reconcile conflicting multi-source data; (iv) WHO/ICC-criteria-based diagnostic classification; and (v) NCCN-based risk stratification using the conventional risk model for PV, IPSET-thrombosis for ET, and DIPSS, DIPSS-plus, and MIPSS70/MIPSS70+ v2 for MF. Patients were identified via MPN-related ICD-9/10 codes; cases met 2017 WHO criteria or had a hematologist-documented diagnosis, and controls did not. The cohort was split into a prompt-development set (n = 60) and a held-out test set (n = 450; 75 cases and 75 controls per disease). Ground truth was established by independent dual-clinician chart review with consensus adjudication. LLM performance was evaluated against the ground truth: variable-level extraction using accuracy, F1 score, and Cohen's kappa; patient-level diagnostic classification using sensitivity, specificity, and Cohen's kappa; and prognostic risk stratification (among confirmed cases) using accuracy, weighted F1 score, and quadratic-weighted Cohen's kappa. 
Wilson 95% confidence intervals (CIs) were used for proportions and bootstrap 95% CIs with 500 resamples for F1 scores. Results. The held-out test set included 450 patients (PV: 150; ET: 150; MF: 150) with pathology reports and structured laboratory results, and 172 patients (PV: 52; ET: 55; MF: 65) with clinical notes. From pathology reports, overall variable extraction accuracy and F1 score were 99% (95% CI, 98-100) and 1.00 (0.99-1.00) for PV, 100% (99-100) and 0.99 (0.96-1.00) for ET, and 100% (99-100) and 0.99 (0.97-1.00) for MF. From clinical notes, overall accuracy and F1 score were 96% (91-100) and 0.94 (0.85-1.00) for PV, 100% (100-100) and 1.00 (1.00-1.00) for ET, and 100% (99-100) and 0.98 (0.95-1.00) for MF. Diagnostic sensitivity was 100% (95% CI, 95.1-100.0) for PV, ET, and MF; specificity was 98.7% (92.8-99.8) for PV and 100% (95.1-100.0) for both ET and MF, with Cohen's kappa of 0.99 for PV and 1.00 for ET and MF. Risk stratification accuracy was 100% with weighted F1 score of 1.00 and quadratic-weighted Cohen's kappa of 1.00 across all three diseases. A pre-specified source-ablation analysis showed that pathology reports alone were sufficient for diagnosis (sensitivity 98.7% for PV, 100% for ET, 96.0% for MF; specificity 100% across all three subtypes) but inadequate for prognostication (accuracy 69.3% for PV, 93.3% for ET, 77.3% for MF). Adding clinical notes to pathology reports recovered full prognostic accuracy of 100% across all three diseases. Conclusions. This first-in-class automated framework achieved expert-level performance for MPN diagnosis and risk stratification from real-world EHR data, establishing a foundation for scalable, standardized phenotyping in rare hematologic malignancies. Prospective, multi-site validation is warranted before clinical deployment.
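
For readers unfamiliar with the Wilson interval named in the Methods, a minimal sketch of the standard formula follows. The counts used in the example (75 of 75, and 74 of 75) are inferred from the reported percentages and the stated 75 cases and 75 controls per disease; they are not quoted from the preprint.

```python
import math


def wilson_interval(successes, n, z=1.959964):
    """Wilson score interval for a binomial proportion (default: 95% level)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    # Clamp to [0, 1] to guard against floating-point overshoot at p = 0 or 1.
    return max(0.0, center - half_width), min(1.0, center + half_width)


# Inferred counts: sensitivity 75/75, PV specificity 74/75.
for label, k, n in [("sensitivity", 75, 75), ("PV specificity", 74, 75)]:
    low, high = wilson_interval(k, n)
    print(f"{label}: {100 * k / n:.1f}% ({100 * low:.1f}-{100 * high:.1f})")
```

Under those inferred counts, the formula gives 95.1-100.0 for 75/75 and 92.8-99.8 for 74/75, which agrees with the intervals reported for diagnostic sensitivity and for PV specificity.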

11

Analytical Performance and 99th Percentile Upper Reference Limit of the Novel SPINCHIP High-Sensitivity Cardiac Troponin I Point-of-Care Assay

MacKenzie, J.; Aakre, K. M.; Paus, D.; Broughton, M. N.; Storvold, G. L.; Olberg, A.; Stenmark, S.; Booij, B. B.; Scott, S.; Michel-Busseret, S.; Octave, L.; Tveit, A.; Lyngbakken, M. N.; Nilsson, J.; Rosjo, H.

2026-07-20 emergency medicine 10.64898/2026.07.17.26357157 medRxiv

Top 0.2%

1.4%

BACKGROUND: In line with International Federation of Clinical Chemistry and Laboratory Medicine (IFCC) recommendations for high-sensitivity cardiac troponin assays, analytical validation and reference limit assessments are required to confirm that an assay meets performance criteria. This study evaluated the analytical performance and established the 99th percentile upper reference limit (URL) for the SPINCHIP High-Sensitivity Cardiac Troponin I (SPINCHIP hs-cTnI) point-of-care assay. METHODS: Analytical performance characteristics, including the limit of blank (LoB), limit of detection (LoD), and limit of quantification (LoQ), were assessed. Additionally, 1,053 plasma samples and 1,055 whole-blood samples were used to determine the URL. Imprecision around the 99th percentile URL was evaluated as part of the analytical validation. High-sensitivity criteria were assessed by confirming measurable cTnI in ≥50% of healthy individuals (n=432 plasma; n=431 whole blood) and achieving imprecision <10% at the 99th percentile (plasma, n=960; whole blood, n=480). RESULTS: SPINCHIP hs-cTnI demonstrated a LoB of 0.3 ng/L; LoDs of 0.8 ng/L (plasma) and 0.9 ng/L (whole blood); and LoQs of 1.1 ng/L (plasma) and 1.4 ng/L (whole blood). The analytical measuring range was 1.1-9,000 ng/L. Imprecision at the common 99th percentile URL (14 ng/L) was 5.8%; for men (URL=16 ng/L) 5.6% and for women (URL=10 ng/L) 6.3%. Greater than 85.2% (94.0% and 76.1% in men and women, respectively) of healthy individuals showed measurable cTnI above the LoD. CONCLUSIONS: The SPINCHIP hs-cTnI assay meets the IFCC high-sensitivity requirements, demonstrating <10% imprecision at the 99th percentile, reliable low-concentration precision and cTnI detection in more than half of healthy individuals.

12

HiExM Enables Scalable Mapping of Organelle Morphology and Spatial Heterogeneity

Day, J. H.; Farrell, J. D.; Yang, D.; Neira, F. N.; Allen, E. A.; Byrne, A. M.; Leksa, N. C.; Klinger, K. W.; de Nola, G.; Al-Jazrawe, M.; Boyer, L. A.

2026-07-14 cell biology 10.64898/2026.07.12.738053 medRxiv

Top 0.2%

1.3%

Quantitative image analysis of subcellular organization requires sufficient spatial resolution to resolve individual organelles and sample size to capture heterogeneity both within cells and between cells. Existing imaging approaches often force a tradeoff between spatial resolution and throughput, limiting the ability to measure organelle-level phenotypes across cell populations. Here, we establish high-throughput expansion microscopy (HiExM) as a scalable pipeline for single-organelle analysis. As a benchmark, we focus on mapping late endosomes and lysosomes (LELs), a heterogeneous organelle class whose small size, dense intracellular distribution, and functional diversity make it difficult to quantify accurately using conventional light microscopy. HiExM increases effective spatial resolution while preserving compatibility with large-scale image acquisition, enabling robust segmentation and quantitative profiling of individual LELs across large cell populations. Using this pipeline, we identified differences in intracellular trafficking behavior among anti-transferrin receptor antibodies that could not be captured by conventional colocalization analysis alone. We further integrate spatial and morphological features with learned image-based representations that can define relationships between LEL morphology and subcellular position as well as how these relationships respond to perturbations. Together, our work establishes HiExM as a generalizable platform for scalable single-organelle profiling, enabling an analytical framework for quantifying discrete organelles across cells and conditions.

13

DNA-FISH Metaphase Spreads to Distinguish Extrachromosomal DNA from Homogeneously Staining Regions in Human Cancer Cell Lines

Masters, L. M.; Hagstrom, K. M.; Erwin, G. S.

2026-07-08 cancer biology 10.64898/2026.07.07.735342 medRxiv

Top 0.2%

1.2%

Whole-genome sequencing identifies focal DNA amplifications with base-pair resolution but cannot determine whether amplified sequences reside on extrachromosomal DNA (ecDNA, also known as double minutes) or within chromosomally integrated homogeneously staining regions (HSRs). DNA fluorescence in situ hybridization (DNA-FISH) metaphase spreads remain the gold standard for distinguishing these amplification states at single-cell resolution. Here, we present a detailed protocol for DNA-FISH metaphase spreads using human cancer cell lines, encompassing cell culture, metaphase arrest, hypotonic treatment, fixation, chromosome spreading, fluorescent probe hybridization, and fluorescence imaging. The protocol incorporates intermediate quality-control steps to verify successful chromosome dispersion and optimize metaphase spread quality, making the workflow accessible to laboratories without specialized cytogenetics expertise. Results demonstrate clear visualization of ecDNA and HSR amplification states using locus-specific probes and illustrate common technical artifacts that can affect interpretation. This protocol provides a robust and reproducible approach for studying the structural organization of oncogene amplification in cancer cells.

14

Unveiling Cerebrospinal Fluid Protein Biomarkers in Pediatric Acute Lymphoblastic Leukemia Using Proximity Extension Assay

Moballegh Nasery, M.; Gergely, R.; Kutszegi, N.; Szegedi, I.; Erdelyi, D. J.; Kiss, C.; Csosz, E.

2026-07-03 biochemistry 10.64898/2026.07.03.736065 medRxiv

Top 0.2%

1.1%

Background: Acute Lymphoblastic Leukemia (ALL) is a highly heterogeneous pediatric malignancy. Despite high survival rates, relapse and central nervous system (CNS) involvement remain significant clinical challenges. Traditional clinical parameters often lack the precision required for early detection and risk stratification. This study utilizes high-throughput proteomics and machine learning to identify molecular signatures in cerebrospinal fluid (CSF) that characterize disease effect and treatment response. Methods: 82 CSF samples from 41 pediatric ALL patients at diagnosis (VD) and remission (VR) were analyzed. Proteomic profiling of 276 proteins was performed using the Olink Proximity Extension Assay. Differentially abundant proteins were identified (q-value < 0.05, |log2FC| > 0.5) using the Wilcoxon rank-sum test. Three machine-learning algorithms - Random Forest, LASSO, and SVM-RFE - were integrated to select the differentially abundant proteins in VR and VD and between CNS involvement levels. To validate the data, Pan-Cancer Atlas analysis was done using two different platforms. Results: In the remission phase, we observed significant alterations in the expression of key proteins compared to diagnosis, with ADGRG1 and KYNU showing a marked increase, while CCL17, CD5, CD27, CXCL9, CXCL11, FASLG, GZMA, and TNFRSF9 were significantly downregulated. Furthermore, our analysis identified distinct protein signatures associated with CNS involvement: CCL4, CTSC, CXCL10, CXCL9, and MMP7 were differentially abundant at the VD stage, whereas CAIX, CASP-8, HAGH, CXCL9, MMP7, MCP-2, and VWC2 were differentially abundant at the VR stage. Conclusion: Integrating Olink proteomics with machine learning identified molecular signatures in ALL that have the potential to be further developed into a biomarker panel for monitoring treatment response and guiding personalized therapeutic strategies, shifting the focus toward Precision One Health approaches.

15

Semi-quantitative Classification of HIV-1 Nucleic Acids Using ResNet Image Analysis of Discretized Isothermal Amplification Reactions in a Microfluidic Chip

Martin, C.; Benson, N.; Gummalla, N.; Shimazu, K.; Bender, A.; Beck, D.; Posner, J.

2026-06-24 bioengineering 10.64898/2026.06.24.734232 medRxiv

Top 0.3%

1.1%

Isothermal nucleic acid amplification tests enable rapid and decentralized molecular diagnostics but often lack robust quantitative readouts compared to quantitative PCR. Here, we present a semi-quantitative nucleic acid measurement approach using machine learning to extract spatiotemporal features from real-time fluorescence imaging of rapid isothermal amplification reactions in microfluidic chips. A convolutional neural network was trained on multiple images sampled throughout a chip-based recombinase polymerase amplification reaction to classify samples into clinically relevant or logarithmically spaced concentration ranges spanning five orders of magnitude. The clinical classification model achieved 94.6% accuracy, and the logarithmic model achieved 92.7% accuracy, with most errors occurring between adjacent concentration categories. By learning spatiotemporal patterns of fluorescence development rather than relying on explicit feature extraction, the model remained accurate at both high and low nucleic acid concentration regimes where other quantitative isothermal molecular tests struggle. This approach enables automated interpretation of amplification reactions and extends the usable dynamic range of the assay. These results demonstrate that integrating machine learning with image-based amplification methods can support rapid semi-quantitative molecular testing and may facilitate broader deployment of nucleic acid diagnostics outside centralized laboratory settings.
Author summary: Many rapid nucleic acid testing methods for infectious diseases are simple to run but struggle to measure how much genetic material is present, which limits their usefulness in clinical decision-making. In our work, we study a technique that produces visible fluorescent patterns during nucleic acid amplification reactions. Traditionally, the amount of nucleic acid present is measured by counting individual bright spots, but this becomes difficult when the target nucleic acid concentration is high and the spots merge together. We developed a machine learning approach that models how the fluorescence pattern changes over time. By analyzing a sequence of images from each reaction, our model can assign samples to concentration ranges across a wide span. This allows us to extract meaningful information even when traditional analysis methods break down. Because this approach works with simple imaging systems and does not require complex equipment, it could help support more informative and accessible diagnostic testing in point-of-care and low-resource settings.

16

Classpose drives the discovery of colorectal cancer phenotypes in clinical grade whole slide images

Mandal, S.; de Almeida, J. G.; Bräutigam, K.; Papanikolaou, N.; Graham, T. A.

2026-07-10 bioinformatics 10.64898/2025.12.18.695211 medRxiv

Top 0.3%

1.0%

Cell phenotyping in histopathology samples is essential for diagnostic and research workflows. However, human expert annotation requires significant time and expertise while being affected by inter-observer variability. Here, we present Classpose, an easily trainable framework for cell segmentation and phenotyping built on top of Cellpose-SAM with state-of-the-art performance across 6 distinct datasets, outperforming competing methods. We show that this requires fine-tuning the entire network, highlighting how instance segmentation is a poor objective for downstream cellular classification. We apply it to a large whole slide image (WSI) colorectal cancer (CRC) cohort (SurGen) and show that Classpose-derived cellular organisation and morphology features can be used to determine novel spatial morphological phenotypes for clinically relevant molecular conditions (MMR deficiency, BRAF mutations, KRAS mutations) and to predict these same molecular conditions. We make Classpose models available and provide a user-friendly QuPath extension for widespread use by the digital pathology community.

17

Rapid, Comprehensive Methylation-Based Classification of Hematologic Malignancies by Nanopore Sequencing

Achterberg, T.; Vermeulen, C.; van der Ent, H.; Jongmans, M.; Cammel, K.; de Ruijter, E.; Groenewegen, N.; Kranenburg, C.; van Tuil, M.; Waanders, E.; Parihar, M.; Islam, R.; Aijaz, J.; Goemans, B.; Calkoen, F.; van der Sluis, I.; den Boer, M. L.; Boer, J. M.; de Haas, V.; Triche, T.; Alexander, T. B.; Wang, J. R.; Bhakta, N.; Pieters, R.; Kester, L.; Tops, B.; de Ridder, J.

2026-07-02 hematology 10.64898/2026.07.02.26356825 medRxiv

Top 0.3%

0.9%

Hematologic malignancies are diagnosed through a fragmented, sequential workup of morphology, immunophenotyping, cytogenetics, and molecular testing that can take days to weeks and is unavailable at many centers. DNA methylation profiling has transformed central nervous system tumor diagnosis, yet hematologic classifiers have remained confined to narrow acute leukemia panels. Here we present Lamprey, a deep-learning methylation classifier spanning 86 hematologic malignancy entities, trained on a reference cohort of 8,544 patients and deployed directly from nanopore sequencing. A depth-aware training framework allows confident classification from the first minutes of a run. Against blinded integrated reference diagnoses across retrospective, external, and prospective cohorts, Lamprey exceeded 98% accuracy among classified cases. Lamprey reaches a confident call within minutes and costs as little as $82 per sample. Lamprey consolidates a sequential diagnostic workup into a single, rapid, same-day molecular readout.

18

Synergistic effects of deleting the tyrosine phosphatases Shp1 and Shp2 on megakaryopoiesis and thrombopoiesis in mice

Barre, E.; Lourenco-Rodrigues, M.-D.; Zimmermann, L.; Pugliano, M.; Loubiere, C.; Proamer, F.; Rinckel, J.-Y.; Eckly, A.; Qu, Z.; Miao, J.; Zhang, Z.-Y.; Senis, Y. A.; Mazharian, A.

2026-07-10 cell biology 10.1101/2025.10.24.684367 medRxiv

Top 0.4%

0.5%

The Src homology 2 (SH2) domain-containing non-transmembrane protein-tyrosine phosphatases 1 and 2 (Shp1 and Shp2) have been implicated in regulating signaling from a variety of receptors and cell types, including the thrombopoietin (Tpo) receptor Mpl in megakaryocytes (MKs) and platelets. We previously showed that deletion of Shp1 and Shp2 in the MK/platelet lineage in mice using the Pf4-Cre transgene/loxP system impairs megakaryopoiesis and thrombopoiesis. However, we also observed unexpected phenotypes including a motheaten-like phenotype in Shp1-deficient mice and severe myelofibrosis in mice lacking both phosphatases. To determine whether these were lineage-specific effects, we utilized the Gp1ba-Cre transgenic mouse to delete loxP-flanked Shp1 and Shp2 in mice. Bone marrow-derived MKs from these mice expressed approximately 20-25% of Shp1 and Shp2, whereas platelets contained 5-10% of each phosphatase compared with controls. Minor MK/platelet defects were observed in mice lacking either Shp1 or Shp2 alone; however, mice lacking both Shp1 and Shp2 exhibited macrothrombocytopenia, mild bleeding following tail injury, and impaired GPVI-mediated platelet aggregation and Syk phosphorylation, associated with reduced GPVI and integrin 2 subunit expression. Reduced Shp1 and Shp2 expression resulted in a significant reduction in ploidy, a block in MK maturation, and fewer proplatelet-producing MKs. Tpo-mediated Ras/MAPK signaling was reduced in Shp1/2-deficient MKs. Treatment of MKs with structurally distinct Shp2 allosteric inhibitors recapitulated key aspects of the Shp2-deficient phenotype, including aberrant megakaryopoiesis and reduced Mpl signaling. Our study highlights the synergistic functions of Shp1 and Shp2 in the MK/platelet lineage, and identifies Shp2 as a potential therapeutic target in myeloproliferative neoplasms.

Key Points
- Deletion of Shp1 and Shp2 in the MK/platelet lineage in mice results in macrothrombocytopenia and minor effects on platelet function.
- Defects can be partially explained by reduced Mpl signaling and aberrant megakaryopoiesis in the absence of Shp2 activity.

19

Segmentation and classification of retinal pigment granules in fluorescence lifetime imaging microscopy (FLIM) data

Ali, M.; Ahmad, H. A.; Alderzy, H.; Hammer, M.; Heintzmann, R.; Stranik, O.

2026-07-03 bioinformatics 10.64898/2026.06.29.735375 medRxiv

Top 0.4%

0.5%

Alterations of fluorescence properties in retinal pigment epithelium (RPE) cells caused by diseases such as age-related macular degeneration (AMD) highlight the need for detailed analysis of the fluorescent RPE granules at the individual level. Precise segmentation and classification of these granules remain challenging due to their limited visual separability. In this study, we present Classi4RPE, a computational algorithm designed to accurately segment RPE granules and classify them into three categories -- lipofuscin (L), melanolipofuscin (ML), and melanin (M) -- based on fluorescence lifetime imaging data, which provide distinctive contrast. The method is implemented in a custom Python framework and employs seeded watershed segmentation to isolate individual granules. Lipofuscin granules are identified as hyperfluorescent structures with longer lifetimes, while granules with shorter lifetimes are further analyzed based on their spatial lifetime distribution from the center to the edge, enabling discrimination of ML from other melanin-rich granules. Our approach achieves high performance, with mean sensitivities of 0.99 for L granules and 0.90 for ML granules, and corresponding specificities of 0.93 and 0.98, respectively, compared to manually annotated ground truth. These results demonstrate the potential of Classi4RPE to surpass human visual limitations and provide a robust tool for quantitative RPE analysis.
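
The per-class sensitivity and specificity quoted in this abstract are one-vs-rest measures against the manual annotation. A minimal sketch of how such figures are typically computed is given here; the function name `sensitivity_specificity` and the toy labels are hypothetical, are not taken from Classi4RPE, and do not reproduce the reported values.

```python
def sensitivity_specificity(true_labels, predicted_labels, positive_class):
    """One-vs-rest sensitivity and specificity for a single granule class.

    Hypothetical helper for illustration; not part of Classi4RPE.
    """
    tp = fn = fp = tn = 0
    for truth, pred in zip(true_labels, predicted_labels):
        if truth == positive_class:
            if pred == positive_class:
                tp += 1  # granule of this class, correctly recognised
            else:
                fn += 1  # granule of this class, missed
        else:
            if pred == positive_class:
                fp += 1  # other granule, wrongly assigned to this class
            else:
                tn += 1  # other granule, correctly kept out of this class
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Made-up labels for six granules (L, ML, M), for illustration only.
manual = ["L", "L", "ML", "M", "ML", "M"]
automatic = ["L", "L", "ML", "ML", "M", "M"]

for granule_class in ("L", "ML"):
    sens, spec = sensitivity_specificity(manual, automatic, granule_class)
    print(granule_class, sens, spec)
```

Treating each class in turn as "positive" and all others as "negative" is what allows separate sensitivity and specificity values to be stated for L and ML granules in a three-class problem.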

20

netPCF: Geometry-Aware Pair Correlation Functions for Spatial Biology

Moore, J. W.; Bull, J. A.; Byrne, H. M.

2026-07-07 bioinformatics 10.64898/2026.07.02.736020 medRxiv

Top 0.4%

0.5%

Spatial organisation is a defining feature of biological systems, underpinning cellular interactions, tissue function, disease progression and therapeutic response. Identifying and quantifying spatial organisation may require methods that resolve relationships across spatial scales. The pair correlation function (PCF) quantifies spatial dependence between points across multiple length scales, but its standard Euclidean formulation is poorly suited to data defined on irregular, curved or otherwise structured domains, where tissue geometry may constrain biological organisation and distort Euclidean distances. Here, we introduce netPCF, a geometry-aware extension of the PCF for quantifying spatial organisation on complex biological domains. By representing tissue structures, anatomical surfaces and other constrained geometries as spatial networks, netPCF generalises the PCF beyond extrinsic Euclidean settings. The framework derives the expected behaviour of the statistic under complete spatial randomness using interpretable finite-support kernels, provides bootstrap-based uncertainty quantification, and includes practical criteria for assessing domain discretisation adequacy. We further extend netPCF to marked (labelled) biological data using feature kernels for categorical and continuous attributes, enabling unified analysis of cell identities, marker intensities, phenotypic states, gene expression and other quantitative features on structured domains in any spatial dimension. All methods are implemented in the open-source Python package spacenet. Synthetic studies show that netPCF recovers classical Euclidean behaviour on sufficiently resolved networks and is robust to common imaging noise. We demonstrate its utility in two biological applications. In three-dimensional imaging mass cytometry data from HER2+ breast carcinoma, netPCF separates tissue architecture-driven proximity from biologically meaningful endothelial and immune cell organisation. 
In reconstructed surfaces of developing murine embryos, netPCF identifies a transition in the Wnt1-Wnt6 relationship from short-range co-localisation at E9.5 to spatial exclusion at E11.5, a pattern of ectodermal boundary refinement not captured by prior voxel-wise co-expression analysis. Overall, netPCF provides a statistically grounded and practical framework for quantifying spatial organisation on complex biological domains.
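
To make the idea of a pair correlation function on a network concrete, here is a deliberately simplified sketch. It is not the netPCF estimator and does not use the spacenet API: it assumes an unweighted graph, at most one point per node, hop-count distances, and a null model in which points are placed uniformly at random on nodes, whereas the abstract describes finite-support kernels and bootstrap uncertainty. The names `bfs_distances` and `network_pcf` are hypothetical.

```python
from collections import deque


def bfs_distances(adjacency, source):
    """Hop-count shortest-path distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def network_pcf(adjacency, point_nodes, max_r):
    """Observed / expected pair counts at each hop distance 1..max_r.

    Simplifying assumptions (not from the source): points sit on nodes,
    at most one point per node, and the expectation is taken under
    uniform random placement of the points on the nodes.
    """
    n_nodes = len(adjacency)
    n_points = len(point_nodes)
    occupied = set(point_nodes)
    # Probability that any given other node holds a point under the null model.
    p_occupied = (n_points - 1) / (n_nodes - 1)
    observed = [0.0] * (max_r + 1)
    expected = [0.0] * (max_r + 1)
    for src in point_nodes:
        for node, d in bfs_distances(adjacency, src).items():
            if 1 <= d <= max_r:
                if node in occupied:
                    observed[d] += 1.0
                expected[d] += p_occupied
    return {r: observed[r] / expected[r]
            for r in range(1, max_r + 1) if expected[r] > 0}


# Toy domain: a ring of 12 nodes, so distances follow the ring, not a straight line.
ring = {i: [(i - 1) % 12, (i + 1) % 12] for i in range(12)}

clustered = network_pcf(ring, [0, 1, 2, 3], max_r=3)  # adjacent points
regular = network_pcf(ring, [0, 3, 6, 9], max_r=3)    # evenly spaced points
print(clustered[1], regular[1])
```

Values above 1 indicate more pairs at that network distance than the null model expects (clustering), and values below 1 indicate fewer (exclusion), which is the same reading as for the Euclidean PCF.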